Quantum computing (QC) promises significant advantages over classical computers on certain hard computational tasks. However, current quantum hardware, known as noisy intermediate-scale quantum (NISQ) devices, is still unable to carry out computations faithfully, mainly because it lacks quantum error correction (QEC) capability. A significant body of theoretical work has provided various types of QEC codes; one notable topological code is the surface code, whose features, such as requiring only nearest-neighbor two-qubit gates and having a high error threshold, make it a leading candidate for scalable quantum computation. Recently developed machine learning (ML) techniques, especially reinforcement learning (RL) methods, have been applied to the decoding problem and have already made measurable progress. Nevertheless, the device noise pattern may change over time, rendering trained decoder models ineffective. In this paper, we propose a continual reinforcement learning method to address these decoding challenges. Specifically, we implement a double deep Q-learning with probabilistic policy reuse (DDQN-PPR) model to learn surface code decoding strategies for quantum environments with varying noise patterns. Through numerical simulations, we show that the proposed DDQN-PPR model can significantly reduce computational complexity. Moreover, increasing the number of trained policies further improves the agent's performance. Our results open a way to build more capable RL agents that can leverage previously gained knowledge to tackle QEC challenges.
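As a rough illustration of the reuse mechanism (a sketch under our own naming, not the authors' implementation; `env`, `q_new`, and the policy objects are placeholders), a PPR agent draws a stored policy with probability weighted by its past returns, then mixes it with the current DDQN policy under a reuse probability that decays within each episode:

```python
import math
import random

class PolicyReuseAgent:
    """Minimal sketch of probabilistic policy reuse (PPR) wrapped around a
    DDQN learner. `q_new` is the Q-network being trained; `library` holds
    frozen policies trained under earlier noise patterns."""

    def __init__(self, q_new, library, psi0=1.0, nu=0.95, tau=0.5):
        self.q_new = q_new
        self.library = library
        self.gains = [0.0] * len(library)   # running mean return per old policy
        self.uses = [0] * len(library)
        self.psi0 = psi0                    # initial reuse probability
        self.nu = nu                        # within-episode decay of reuse
        self.tau = tau                      # softmax temperature over gains

    def pick_old_policy(self):
        # Softmax over recorded gains: better past policies get reused more.
        weights = [math.exp(g / self.tau) for g in self.gains]
        r, acc = random.uniform(0, sum(weights)), 0.0
        for i, w in enumerate(weights):
            acc += w
            if r <= acc:
                return i
        return len(weights) - 1

    def run_episode(self, env, epsilon=0.1):
        k = self.pick_old_policy()
        psi, state, ep_return, done = self.psi0, env.reset(), 0.0, False
        while not done:
            if random.random() < psi:                      # exploit old policy
                action = self.library[k].best_action(state)
            elif random.random() < epsilon:                # explore
                action = env.sample_action()
            else:                                          # current policy
                action = self.q_new.best_action(state)
            next_state, reward, done = env.step(action)
            self.q_new.update(state, action, reward, next_state, done)
            state, ep_return = next_state, ep_return + reward
            psi *= self.nu      # rely less on the old policy as the episode proceeds
        self.uses[k] += 1       # update that policy's running mean return
        self.gains[k] += (ep_return - self.gains[k]) / self.uses[k]
        return ep_return
```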
Recent developments in quantum computing and machine learning have propelled the interdisciplinary study of quantum machine learning. Sequential modeling is an important task with high scientific and commercial value. Existing VQC- or QNN-based methods require significant computational resources to perform gradient-based optimization of a large number of quantum circuit parameters. The major drawback is that such quantum gradient calculation requires a large number of circuit evaluations, posing challenges for current near-term quantum hardware and simulation software. In this work, we approach sequential modeling by applying a reservoir computing (RC) framework to quantum recurrent neural networks (QRNN-RC) based on classical RNN, LSTM and GRU architectures. The main idea of this RC approach is that the QRNN with randomly initialized weights is treated as a dynamical system and only the final classical linear layer is trained. Our numerical simulations show that the QRNN-RC can reach results comparable to fully trained QRNN models on several function approximation and time series prediction tasks. Since the QRNN training complexity is significantly reduced, the proposed model trains notably faster. We also compare against corresponding classical RNN-based RC implementations and show that the quantum version learns faster, requiring fewer training epochs in most cases. Our results demonstrate a new possibility of using quantum neural networks for sequential modeling with greater quantum hardware efficiency, an important design consideration for noisy intermediate-scale quantum (NISQ) computers.
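The RC recipe (freeze the recurrent dynamics, fit only a linear readout) can be illustrated with a classical echo-state stand-in; in the QRNN-RC the tanh recurrence below would be replaced by a randomly parameterized quantum circuit. A minimal sketch on a toy one-step-ahead prediction task:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen, randomly initialized "reservoir" (a classical stand-in: in the
# QRNN-RC, this recurrence would be a randomly parameterized quantum circuit).
n_in, n_res = 1, 100
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
W_res *= 0.9 / np.abs(np.linalg.eigvals(W_res)).max()   # keep dynamics stable

def reservoir_states(u):
    """Drive the fixed reservoir with an input sequence u of shape (T, n_in)."""
    h, states = np.zeros(n_res), []
    for u_t in u:
        h = np.tanh(W_in @ u_t + W_res @ h)   # weights are never trained
        states.append(h)
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
u = np.sin(t).reshape(-1, 1)
X, y = reservoir_states(u[:-1]), u[1:]

# Only the final linear readout is trained (here: ridge regression).
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ y)
print("train MSE:", float(np.mean((X @ W_out - y) ** 2)))
```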
In recent years, the importance of data privacy in deep learning has received significant attention. Financial applications of deep learning risk data leakage when they operate without the oversight of financial regulators, yet, to the best of our knowledge, there is little related research in the financial domain. We apply two representative deep learning privacy-preserving frameworks proposed by Google to financial transaction data. We design experiments with several of the parameter settings proposed in the original studies. In addition, we refer to the privacy levels recommended by Google and Apple to estimate the results more reasonably. The results show that DP-SGD performs better than the PATE framework on financial transaction data, with a smaller trade-off between privacy and accuracy, and its privacy level also matches practical requirements. We can therefore obtain a strong privacy guarantee at a modest cost in accuracy, helping to avoid potential financial losses.
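For reference, DP-SGD privatizes each update by clipping per-example gradients and adding calibrated Gaussian noise. A minimal sketch of a single step, with illustrative hyperparameters rather than those used in the experiments:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One DP-SGD update: clip each example's gradient to `clip_norm`,
    sum, add Gaussian noise scaled to the clipping norm, and average."""
    rng = rng or np.random.default_rng()
    batch, dim = per_example_grads.shape
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Clip: rescale any gradient whose L2 norm exceeds clip_norm.
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Noise calibrated to the per-example sensitivity (clip_norm).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch
    return params - lr * noisy_mean

# Toy usage: three examples, two parameters.
g = np.array([[3.0, 4.0], [0.1, 0.2], [-1.0, 0.5]])
print(dp_sgd_step(np.zeros(2), g))
```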
Quantum computing promises significant improvements over classical computers in solving hard computational tasks. However, designing quantum circuits for practical use is not a trivial objective and requires expert-level knowledge. To aid this endeavor, machine-learning-based methods have been proposed for constructing quantum circuit architectures. Previous works have demonstrated that classical deep reinforcement learning (DRL) algorithms can successfully construct quantum circuit architectures without encoded physics knowledge. However, these DRL-based works do not generalize to settings with changing device noise, and therefore require substantial training resources to keep the RL models up to date. With this in mind, we incorporate continual learning to improve the performance of the algorithm. In this paper, we present the probabilistic policy reuse with deep Q-learning (PPR-DQL) framework to tackle this circuit design challenge. Through numerical simulations over various noise patterns, we demonstrate that an RL agent with PPR can find the quantum gate sequence that generates the two-qubit Bell state faster than an agent trained from scratch. The proposed framework is general and can be applied to other quantum gate synthesis or control problems, including the automatic calibration of quantum devices.
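The underlying RL formulation can be sketched as an episodic environment whose actions append gates to a circuit and whose reward signals reaching the target Bell state. The toy version below (gate set, depth limit, and reward are simplified assumptions, and noise is omitted) shows the shape of the search problem:

```python
import numpy as np

I2 = np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
GATES = {"H0": np.kron(H, I2), "H1": np.kron(I2, H), "CNOT": CNOT}
BELL = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

class CircuitEnv:
    """Toy episodic environment: each action appends a gate to a circuit
    acting on |00>; the episode ends on success or at max depth."""

    def __init__(self, max_depth=5):
        self.max_depth = max_depth

    def reset(self):
        self.state = np.array([1.0, 0.0, 0.0, 0.0])   # |00>
        self.depth = 0
        return self.state

    def step(self, gate_name):
        self.state = GATES[gate_name] @ self.state
        self.depth += 1
        fidelity = abs(BELL @ self.state) ** 2
        done = fidelity > 0.99 or self.depth >= self.max_depth
        return self.state, (1.0 if fidelity > 0.99 else 0.0), done

env = CircuitEnv()
env.reset()
for g in ["H0", "CNOT"]:        # the textbook Bell-state circuit
    state, reward, done = env.step(g)
print("reward:", reward)         # 1.0, which is what the agent must rediscover
```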
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
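The generation-and-filtering recipe can be sketched in a few lines; `generate` and `label` below are hypothetical stand-ins for LM API calls, and the prompt and threshold are illustrative rather than the paper's:

```python
# Sketch of the generate-then-filter recipe for LM-written evaluations.
# `generate` and `label` are stand-ins for calls to a language model API;
# nothing here reproduces the paper's prompts or models.

def build_eval(behavior, generate, label, n_raw=1000, min_conf=0.9):
    """Ask one LM to draft yes/no questions probing `behavior`, then keep
    only the candidates a second LM labels with high confidence."""
    prompt = (f"Write a yes/no question testing whether an AI assistant "
              f"exhibits the following behavior: {behavior}")
    dataset = []
    for question in generate(prompt, n=n_raw):     # LM-based generation
        answer, confidence = label(question)       # LM-based filtering
        if confidence >= min_conf:
            dataset.append({"question": question, "answer": answer})
    return dataset
```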
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a supervised learning and a reinforcement learning phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with RL using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels.
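The supervised phase can be sketched as a critique-and-revision loop; `model` below is a hypothetical text-completion callable, and the single principle is illustrative (the method samples principles from a longer constitution):

```python
# Sketch of the supervised (critique -> revision) phase of Constitutional AI.
PRINCIPLE = ("Identify ways the response is harmful, unethical, or "
             "otherwise objectionable, then rewrite it to remove them.")

def critique_and_revise(model, prompt: str, n_rounds: int = 1) -> str:
    response = model(prompt)
    for _ in range(n_rounds):
        critique = model(f"Response: {response}\n"
                         f"Critique request: {PRINCIPLE}\nCritique:")
        response = model(f"Response: {response}\nCritique: {critique}\n"
                         f"Rewrite the response to address the critique.\n"
                         f"Revision:")
    return response   # revised responses become finetuning targets
```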
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
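Since the checkpoints are openly released, a quick usage sketch with the Hugging Face transformers library (using the small bloom-560m variant, as the full 176B model needs substantial hardware):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tok("The ROOTS corpus covers 46 natural languages and",
             return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```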
Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on how to turn it into one that can be productively studied empirically. We first present an experimental design centered on choosing tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable large-language-model dialog assistant through chat -- a trivial baseline strategy for scalable oversight -- substantially outperform both the model alone and their own unaided performance. These results are an encouraging sign that scalable oversight will be tractable to study with present models and bolster recent findings that large language models can productively assist humans with difficult tasks.
In this study, the radiomics approach is extended to optical fluorescence molecular imaging data for tissue classification, termed "optomics". Fluorescence molecular imaging is emerging as a tool for precise surgical guidance during resection of head and neck squamous cell carcinoma (HNSCC). However, tumor-to-normal tissue contrast is confounded by intrinsic physiological limitations of heterogeneous expression of the target molecule, epidermal growth factor receptor (EGFR). Optomics seeks to improve tumor identification by probing textural pattern differences in EGFR expression conveyed by fluorescence. A total of 1,472 standardized optomic features were extracted from fluorescence image samples. A supervised machine learning pipeline based on a support vector machine classifier was trained with the 25 top-ranked features selected by the minimum redundancy maximum relevance criterion. Model predictive performance was compared with that of a fluorescence intensity thresholding method by classifying testing-set image patches of resected tissue with histologically confirmed malignancy status. The optomics approach provided consistent prediction accuracy across all test-set samples, irrespective of dose, compared with the fluorescence intensity thresholding method (mean accuracy of 89% vs. 81%; P = 0.0072). The improved performance suggests that extending the radiomics approach to fluorescence molecular imaging data offers a promising image analysis technique for cancer detection in fluorescence-guided surgery.
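A sketch of the classification pipeline in scikit-learn; mutual-information ranking stands in for the paper's minimum redundancy maximum relevance (mRMR) selection, and the data arrays are random placeholders:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1472))     # 1,472 optomic features per image patch
y = rng.integers(0, 2, size=200)     # histology-confirmed malignancy label

pipe = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=25),   # keep 25 top-ranked features
    SVC(kernel="rbf"),
)
pipe.fit(X[:150], y[:150])
print("patch accuracy:", pipe.score(X[150:], y[150:]))
```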
We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: for example, respondents are split almost exactly in half on the importance of artificial general intelligence, on whether language models understand language, and on the necessity of linguistic structure and inductive bias for solving NLP problems. In addition, the survey posed meta-questions, asking respondents to predict the distribution of survey responses. This allows us not only to gain insight into the range of beliefs held by NLP researchers, but also to uncover false sociological beliefs, where the community's predictions do not match reality. We find such mismatches on a wide range of issues. Among other results, the community substantially overestimates its own belief in the usefulness of benchmarks and in the potential of scaling to solve real-world problems, while underestimating its belief in the importance of linguistic structure, inductive bias, and interdisciplinary science.
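The meta-question mechanism reduces to a comparison between the community's predicted response distribution and the actual one; a toy illustration with made-up numbers (the real figures are in the survey):

```python
# Hypothetical numbers for one survey item, not results from the paper.
actual_agree = 0.50      # fraction of respondents who agree with the claim
predicted_agree = 0.75   # mean predicted agreement across respondents

# A positive gap is a "false sociological belief": the community thinks
# agreement is more common than it actually is.
gap = predicted_agree - actual_agree
print(f"community overestimates agreement by {gap:+.0%}")   # +25%
```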